Evaluating Students’ Code as a Learning Product
Today’s layout
A bit about me…
“Supporting Data-Intensive Environmental Science Research: Data Science Skills for Scientific Practitioners of Statistics”
How has students’ code been analyzed?
A comparison of formula and tidyverse syntaxes (McNamara 2023)
Rafalski et al. (2019) applied similar ideas, comparing students’ ability to write accurate code across three R syntaxes: the tidyverse, base R, and the tilde (formula) style.
An alternative way to analyze students’ code
The Importance of Students’ Attention to Program State (Lewis 2012)
Attends to both the code produced by a student and their learning process
Pairs a student’s code with their debugging behavior side-by-side
Analyses like these should not be few and far between: students’ code offers a unique avenue for qualitative research on the teaching and learning of computing.
Warm-up (90 seconds)
What process would you expect a student to use to create a multivariate scatterplot with different colors for different groups?
A framework for analyzing students’ code (Schulte 2008)
| Level | Text Surface | Program Execution | Function |
|---|---|---|---|
| Macrostructure | Understanding the overall structure of the program | Understanding the “algorithm” of the program | Understanding the goal / purpose of the program (in its context) |
| Relations | References between blocks, e.g., method calls, object creation | Sequence of method calls, object sequence diagrams | Understanding how sub-goals are related to goals, how function is achieved by subfunctions |
| Blocks | Regions of interest (ROI) that syntactically or semantically build a unit | Operation of a block, a method, or a ROI (as a sequence of statements) | Function of a block, may be seen as a sub-goal |
| Atoms | Language elements | Operation of a statement | Function of a statement, only understandable in context |
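To see how the rows of this table can apply to a single piece of R, the sketch below unpacks one student statement from later in this talk (`gas <- gas[!(substr(gas$sampleID,3,3) %in% c("b","c")), ]`) into its atoms before running it whole, treated here as a block. Only the column name `sampleID` comes from the student's code; the toy data frame and its values are invented for illustration, and this split into levels is one possible reading rather than Schulte's own worked example.

```r
# Hypothetical toy data: only the column name `sampleID` comes from the
# student's code; the IDs and values are invented for illustration.
gas <- data.frame(
  sampleID = c("01a", "01b", "02c", "02a"),
  value    = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Atoms: individual language elements, each evaluated on its own
third_char <- substr(gas$sampleID, 3, 3)    # 3rd character of each ID
flagged    <- third_char %in% c("b", "c")   # TRUE where that character is b or c
keep       <- !flagged                      # negate: TRUE for rows to keep

# Block: the student's single statement, combining the atoms above to
# drop every row whose ID has "b" or "c" in position 3
gas <- gas[!(substr(gas$sampleID, 3, 3) %in% c("b", "c")), ]
```

The Function column would then ask *why* the student drops these samples, which, as the table notes, is only understandable in the context of the study and cannot be read off the code alone.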
Coding students’ code
Descriptive code
“Filters a vector of values using the extraction operator, based on an equality relation with a variable selected from the data frame using the `$` operator”
In-vivo code
“Uses `[ ]` and `==` to filter vector, uses `$` to select variable”
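Both codes describe the same kind of statement. The minimal sketch below is modeled on one of the student statements shown under emergent themes; it runs each operator the codes name one step at a time. The `RPMA2GrowthSub` data frame and its `Age` and `Weight` columns are taken from that student code, but the values here are invented purely for illustration.

```r
# Hypothetical toy data standing in for the student's RPMA2GrowthSub;
# the values are invented for illustration only.
RPMA2GrowthSub <- data.frame(
  Age    = c(4, 5, 5, 6, 5),
  Weight = c(10.2, 12.0, NA, 15.1, 13.0)
)

# `$` selects a variable (column) from the data frame
ages <- RPMA2GrowthSub$Age

# `==` builds a logical vector: TRUE where Age is exactly 5
is_age5 <- ages == 5

# `[ ]` keeps only the Weight values where the logical vector is TRUE
weights_age5 <- RPMA2GrowthSub$Weight[is_age5]

# The student's one-line version combines all three steps, then averages
# the filtered weights, dropping missing values with na.rm = TRUE
Weight5 <- mean(RPMA2GrowthSub$Weight[RPMA2GrowthSub$Age == 5], na.rm = TRUE)
```

Decomposing the one-liner this way mirrors the difference between the two codes: the descriptive code names each operation, while the in vivo code records the operators the student actually typed.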
Uncovering emergent themes
```r
linearAnterior <- lm(PADataNoOutlier$Lipid ~ PADataNoOutlier$PSUA)
early <- subset(RPMA2Growth, StockYear < 2006)
Weight5 <- mean(RPMA2GrowthSub$Weight[RPMA2GrowthSub$Age == 5], na.rm = TRUE)
gas <- gas[!(substr(gas$sampleID,3,3) %in% c("b","c")), ]
obsD <- subset(gas, gas$carboy == "D")$N15_N2_Ar
lowerCIBound <- pMat[1:mlleIndex,1][which.min(abs(mlleCI+likelihoods[1:mlleIndex]))]
```
Data wrangling
Statements of code whose purpose is to prepare a dataset for analysis and / or visualization
Sub-themes
An alternative direction
Practical considerations
How much code should I collect?
How can readers trust my analysis?
Trust comes from:
How could this be used?
Concept dependence
How does a student’s concept model of a dataset inform how they filter data?
(atoms; program execution)
Program environment
How do the visualizations produced by students who learn ggplot differ from those of students who learn “base” R?
(blocks; program execution)
Linguistic structure
How do students name objects they will use later?
(relations; text surface)
Learning trajectory
How do students’ exploratory data analyses change over the duration of a course?
(macrostructure; function / purpose)
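As one hedged illustration of the concept-dependence question, the sketch below compares three base-R ways to filter the same column. The `gas` data frame and its `carboy` and `N15_N2_Ar` columns are borrowed from the student code above; the values, including the missing carboy label, are invented. The point it shows is documented base-R behavior: `subset()` treats a missing condition as `FALSE`, while logical indexing with `[ ]` returns `NA` for it.

```r
# Hypothetical toy data modeled on the student's `gas` data frame;
# column names come from the student code, the values are invented.
gas <- data.frame(
  carboy    = c("A", "D", NA, "D"),
  N15_N2_Ar = c(1.1, 2.2, 3.3, 4.4),
  stringsAsFactors = FALSE
)

# Strategy 1: subset() treats a missing (NA) condition as FALSE,
# so the row with an unknown carboy is dropped
obsD_subset <- subset(gas, gas$carboy == "D")$N15_N2_Ar

# Strategy 2: logical indexing with `[ ]` keeps NA conditions,
# which show up as NA entries in the result
obsD_bracket <- gas$N15_N2_Ar[gas$carboy == "D"]

# Strategy 3: which() returns only the TRUE positions, so NAs are skipped
obsD_which <- gas$N15_N2_Ar[which(gas$carboy == "D")]
```

The three strategies look interchangeable at the atom level but diverge in program execution once missing values appear, which is the kind of difference a code-level analysis could connect to a student's model of their data.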
Why is this important for data science education?
Theobold et al. (2023)
How can we distinguish merely interesting learning from effective learning (Wiggins and McTighe 2005)?
Questions?